AUTOMATED STELLAR SPECTRAL CLASSIFICATION AND PARAMETERIZATION
FOR THE MASSES



Ted von Hippel, Carlos Allende Prieto, & Chris Sneden 
Department of Astronomy, University of Texas, Austin, TX 78712 

ABSTRACT: Stellar spectroscopic classification has been successfully automated by a number of groups. 
Automated classification and parameterization work best when applied to a homogeneous data set, and thus 
these techniques primarily have been developed for and applied to large surveys. While most ongoing large 
spectroscopic surveys target extragalactic objects, many stellar spectra have been and will be obtained. We
briefly summarize past work on automated classification and parameterization, with emphasis on the work 
done in our group. Accurate automated classification in the spectral type domain and parameterization in 
the temperature domain have been relatively easy. Automated parameterization in the metallicity domain, 
formally outside the MK system, has also been effective. Because luminosity has only subtle effects on the spectrum,
automated classification in the luminosity domain has been somewhat more difficult, but still successful. In order to
extend the use of automated techniques beyond a few surveys, we present our current efforts at building 
a web-based automated stellar spectroscopic classification and parameterization machine. Our proposed 
machinery would provide users with MK classifications as well as the astrophysical parameters of effective 
temperature, surface gravity, mean abundance, abundance anomalies, and microturbulence. 

1. A BRIEF HISTORY OF AUTOMATED CLASSIFICATION

Current or planned large-scale surveys, such as the Sloan Digital Sky Survey (York et
al. 2000) or the GAIA mission (scheduled for launch around 2011), have led to increased
interest in automated spectral classifiers (e.g., Bailer-Jones 2000). There are many other
reasons to develop automated classifiers, not the least of which are the homogeneity of the 
results and the repeatability of the process. Automated spectral classification of stars goes 
back decades. For instance, in an early attempt Jones (1966) fit a few major stellar lines 
and correlated these indexes with MK type (Morgan, Keenan, & Kellman 1943). Malyuto 
& Shvelidze (1994) later developed this technique further. Unfortunately, the line-fitting
technique has the disadvantage that one must first know the approximate stellar
type before determining which lines to fit; otherwise very different features will be found
at the same wavelengths. Kurtz (1983; see also LaSala 1994) developed a minimum vector
distance technique that matched spectra to a library of standards, weighting the comparison
toward different spectral regions for different types of stars. The minimum vector distance
technique has had some success - classifying stellar spectral types to within σ = 2.2 spectral
subtypes - but refining this technique is cumbersome since the weighting vectors need to 
be carefully established, yet they vary as a function of spectral type and luminosity class. 
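
As a rough illustration of the minimum vector distance idea, the following Python sketch assigns a spectrum the label of the nearest library standard under a weighted Euclidean distance. The function names, the three-pixel toy spectra, and the weights are all invented for illustration and are not taken from Kurtz (1983), whose actual weighting scheme is not described here.

```python
import math

def weighted_distance(spectrum, standard, weights):
    """Weighted Euclidean distance between two spectra on the same wavelength grid."""
    return math.sqrt(sum(w * (s - t) ** 2
                         for s, t, w in zip(spectrum, standard, weights)))

def classify(spectrum, library):
    """Return the label of the library standard closest to `spectrum`.

    `library` maps a label to a (standard_flux, weights) pair, so each
    standard carries its own (hypothetical) weight vector.
    """
    return min(library,
               key=lambda label: weighted_distance(spectrum, *library[label]))

# Toy three-pixel "spectra"; the values are invented for illustration only.
library = {
    "A0 V": ([1.0, 0.4, 1.0], [1.0, 3.0, 1.0]),
    "G2 V": ([0.9, 0.8, 0.7], [1.0, 1.0, 2.0]),
}
print(classify([1.0, 0.45, 0.98], library))
```

Because each standard carries its own weight vector in this sketch, distances to different standards are not measured on a common scale, which is one reason establishing the weights carefully can be cumbersome.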

In the mid-1990s four independent groups (Gulati et al. 1994; von Hippel
et al. 1994; Vieira & Ponz 1995; Weaver & Torres-Dodgen 1995) began to successfully apply 
artificial neural networks (ANNs) to the spectral classification problem. Neural networks are 
trained to yield classifications that are identical to those previously assigned, and thus have 
the advantage that the ANN builder need not become a classification expert, but rather can 
rely on the true experts in the field, via their many previous classifications. For example, 
von Hippel et al. built an ANN using the objective prism spectra and classifications of Houk 
(1982, and references therein), who had at that time already classified more than 10^5 stars.
This first generation of neural networks was applied to low-resolution ultraviolet (Vieira &
Ponz 1995) or optical (the other three studies) spectra and achieved σ = 0.6 spectral subtypes
and σ = 0.35 luminosity classes for A stars and returned E(B-V) (Weaver & Torres-Dodgen
1995), or σ < 2 subtypes over the broad range of O3 to M4 stars. These studies validated
the ANN approach and indicated its tremendous potential. 

The second generation of ANN spectral classification studies (Weaver & Torres-Dodgen 
1997; Bailer-Jones, Irwin, & von Hippel 1998; Singh, Gulati, & Gupta 1998; Weaver 2000) 
focused on increasing sample size and moving on to two-dimensional classification. At this 
point spectral type classification became very good, with σ = 0.5-0.7 subtypes. Luminosity
classification quality varied from σ ~ 0.3 luminosity classes (Weaver & Torres-Dodgen
1997) to a statistically reliable luminosity classification for dwarfs and giants, though not
sub-giants (Bailer-Jones et al. 1998). Interestingly, Weaver (2000) also showed that he could
provide two-dimensional classification for both components of artificial binaries!

A few ANN studies (Bailer-Jones et al. 1997; Snider et al. 2001) moved from MK 
stellar classification to parameterization in terms of astrophysical parameters (T_eff, log(g), [Fe/H]).
Although the philosophies of classification and parameterization differ, they strive
to serve the same community. Classification seeks to place any program star within the 
framework defined by a series of standards. Since the MK classification system is so widely
used and its connections to stellar parameters such as absolute magnitude, surface temperature,
and mass are in general well known, this has been a productive route for studying
individual stars, local stellar populations, and Galactic structure. Stellar parameterization
seeks to skip the initial step of MK classification and directly determine atmospheric parameters.
The cost is that the results are model dependent, but in many parts of the HR
diagram model spectra look very much, though not exactly, like real stellar spectra. In addition,
model atmospheres can be easily constructed for subsolar metallicity, and therefore
parameterization can be applied to spectra that could not be classified on the MK system.
Bailer-Jones et al. (1997) passed their objective prism spectra through ANNs trained
on stellar atmosphere models and derived a detailed mapping between MK classifications 
and the Kurucz (1979, 1992) model atmosphere set they used (as implemented with the 
program SPECTRUM by Gray & Corbally 1994). They also reported that the mean metallicity
in the solar neighborhood, as represented by Houk's (1982) objective prism spectra,
is slightly subsolar, at [Fe/H] = -0.2. Snider et al. (2001) turned the problem around and
trained ANNs on real stellar spectra using atmospheric parameters previously derived from 
fine abundance analysis work in the literature, now in three-dimensional parameter space. 
They achieved σ(T_eff) ≈ 150 K, σ(log(g)) ≈ 0.33 dex, and σ([Fe/H]) ≈ 0.2 dex.

2. A FEW LESSONS LEARNED 

Here we offer a few comments on lessons we have learned in applying ANNs to the 
problems of spectral classification and parameterization: 

• Classification in spectral type and parameterization in T_eff are easy: the ANNs have
little trouble finding a good global solution, and the results are generally precise to
better than one spectral subtype or 200 K.

• Luminosity classification and log(g) parameterization are possible, but more difficult. 
Both require spectra with an adequate combination of resolution, wavelength coverage, 
and S/N. This spectral quality is just achieved with classical MK objective prism 
resolution and wavelength coverage. 

• ANNs are best suited to stellar classification/parameterization when a single wavelength
range and resolution are used for a particular ANN.

• If low S/N spectra are used, besides the higher random errors, ANNs may make 
systematic errors unless they have been trained on low S/N spectra (Snider et al. 
2001). 

• Supervised ANNs treat spectra as patterns with a known correlation between those 
patterns and answers, and attempt to learn that relationship. The better one can
homogenize the training data so that the ANN does not find spurious correlations
between the input catalog and the answers, the better the results one achieves. Larger
catalogs always help in this regard, as do uniform data sets. Spurious correlations
caused by real astrophysics can also be a problem. As an example of this phenomenon,
the Houk catalog is magnitude-limited with V_limit = 10-11. This magnitude limit
creates a correlation between spectral type and luminosity class, i.e., the catalog 
contains mostly early type dwarfs and late type giants. Without proper scrutiny one 
might believe a trained ANN has achieved true luminosity classification using such a 
catalog when the ANN could have learned to classify luminosity statistically based on 
spectral type. 

• Principal Component Analysis (PCA) can be used to compress stellar spectra or to
remove spurious signals (e.g., Storrie-Lombardi et al. 1995). PCA also bears a strong
resemblance to ANNs (Lahav 1995), and is a good pedagogical tool for gaining a
heuristic understanding of ANNs. By creating a series of vectors, a linear combination 
of which will recreate any star in the library, PCA recasts a stellar spectral library in 
much the same way as the hidden nodes in a single hidden layer ANN. 

• Spectral classification errors are a function of spectral type, based largely on the 
number of examples and variance within a given type or range of types. For example, 
Bailer-Jones et al. (1998) found σ(SpT) = 0.5 for B3 to A0 stars and σ(SpT) = 0.8 for
F3 to G1 stars. The lower errors for the B and A stars are probably the result of
reduced sensitivity of their spectra to abundance differences. 

• It is easiest to build multiple ANNs, each specializing in a specific dimension, when 
solving multi-dimensional spectral classification or parameterization problems. Not 
only are the ANNs more likely to converge on a good global solution, but less data are 
required for this approach. For example, Snider et al. (2001) built ANNs specializing 
in each of T_eff, log(g), and [Fe/H]. Weaver (2000) has also found it helpful to use one
ANN for initial rough classifications, followed by specialist ANNs, trained on limited
spectral type ranges, to refine the classifications.
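
The PCA lesson in the list above can be made concrete with a small sketch. The pure-Python example below, which is not the code used in any of the cited studies, finds the leading principal component of a toy library by power iteration and reconstructs each spectrum from the mean plus a single coefficient. The four-pixel spectra and all function names are assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leading_component(spectra, iterations=100):
    """Return (mean spectrum, leading principal component) via power iteration."""
    n_pix = len(spectra[0])
    mean = [sum(col) / len(spectra) for col in zip(*spectra)]
    centered = [[x - m for x, m in zip(s, mean)] for s in spectra]
    v = [1.0] * n_pix  # starting guess; must not be orthogonal to the answer
    for _ in range(iterations):
        # Apply the (unnormalized) covariance matrix implicitly: X^T (X v).
        proj = [dot(row, v) for row in centered]
        v = [sum(p * row[j] for p, row in zip(proj, centered))
             for j in range(n_pix)]
        norm = math.sqrt(dot(v, v))
        if norm == 0.0:
            raise ValueError("starting vector is orthogonal to the data variance")
        v = [x / norm for x in v]
    return mean, v

def compress(spectrum, mean, component):
    """Project a spectrum onto the component: one number replaces n_pix fluxes."""
    return dot([x - m for x, m in zip(spectrum, mean)], component)

def reconstruct(coefficient, mean, component):
    """Rebuild a spectrum from the mean plus one scaled component."""
    return [m + coefficient * c for m, c in zip(mean, component)]

# Toy library: a flat continuum with one absorption line of varying depth.
library = [
    [1.0, 0.9, 0.95, 1.0],
    [1.0, 0.7, 0.85, 1.0],
    [1.0, 0.5, 0.75, 1.0],
]
mean, pc1 = leading_component(library)
for spectrum in library:
    coeff = compress(spectrum, mean, pc1)
    print(round(coeff, 3), [round(x, 3) for x in reconstruct(coeff, mean, pc1)])
```

In this toy case the spectra differ only in line depth, so a single component reproduces every library member. The coefficient plays a role loosely analogous to the activation of one hidden node. Real libraries would need more components.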

3. PROPOSED CLASSIFICATION AND PARAMETERIZATION FOR THE MASSES

Can we build automated spectral classifiers for general use? For some time Bob Garrison
has pointed out that we could build stellar classifiers for particular spectrographs.
Users of such spectrographs might have a near real-time reduction pipeline, immediately 
following which they would receive a stellar classification. This is certainly possible. In fact, 
if the data were obtained in a standard manner, the only reduction steps required prior to 
ANN classification would be spectral extraction and wavelength calibration. 

In practice, neither we nor, to the best of our knowledge, any other group has taken 
this approach. The difficulty is not technical but lies in the time-consuming nature of
building multiple such classification machines for the many possible spectrographs in use 
by stellar spectroscopists. Certainly, if a stellar spectroscopic survey of sufficient size were 
to be undertaken which would create a uniform data set of sufficient quality, we and others 
would be motivated to build a tailor-made stellar classification or parameterization machine 
for that instrument/survey.

We propose instead a thematically related approach. Rather than building ANN classification/parameterization
machines for particular spectrographs, we propose to build such
machines for particular combinations of wavelength coverage, resolution, and S/N, and make 
these available for use via the web. It would be up to the user to process their data onto a 
linear flux and wavelength scale at one of the resolutions and wavelength ranges supported 
by our web site algorithms. Users would upload their spectra, run the classification or 
parameterization ANNs, and receive a spectral type and luminosity class and/or the stellar 
astrophysical parameters, along with the associated uncertainties. 
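
The preprocessing asked of the user could look something like the sketch below, which linearly interpolates a spectrum onto an evenly spaced wavelength grid. The function name, the toy input, and the 150 Å window centered near Hβ (about 4861 Å) are illustrative assumptions. No interface for the proposed web site has been specified, and matching the supported resolution (for example by smoothing) would be a separate step not shown here.

```python
def resample_linear(wave, flux, start, step, n):
    """Interpolate (wave, flux) onto n evenly spaced pixels beginning at `start`.

    `wave` must be strictly increasing and must cover the requested grid.
    Returns the new wavelength grid and the interpolated fluxes.
    """
    grid = [start + i * step for i in range(n)]
    if grid[0] < wave[0] or grid[-1] > wave[-1]:
        raise ValueError("requested grid extends beyond the input spectrum")
    out = []
    j = 0
    for w in grid:
        # Advance to the input interval [wave[j], wave[j + 1]] containing w.
        while wave[j + 1] < w:
            j += 1
        frac = (w - wave[j]) / (wave[j + 1] - wave[j])
        out.append(flux[j] + frac * (flux[j + 1] - flux[j]))
    return grid, out

# Toy input on an uneven wavelength scale (Angstroms); fluxes are invented.
wave = [4786.0, 4800.0, 4830.0, 4870.0, 4936.0]
flux = [0.0, 14.0, 44.0, 84.0, 150.0]
grid, resampled = resample_linear(wave, flux, start=4786.0, step=50.0, n=4)
print(grid)
print(resampled)
```

Simple interpolation does not conserve flux exactly, so a production pipeline would probably prefer a flux-conserving rebinning scheme.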

We hope to develop such a tool, beginning with stellar parameterization based
on model atmospheres. Our entire approach would be modular and would initially support
a single resolution and wavelength range, while covering the parameter range 4500 < T_eff <
8000 K, 2 < log(g) < 5, -4.5 < [Fe/H] < +0.5, and 0 < microturbulence < 1/2 the stellar
rotation break-up velocity. Our initial resolution and wavelength range have not been
finalized, but would probably be R ~ 2000 and 150 Å around Hβ, respectively. This would
allow anyone with higher resolution spectra or spectra with a broader range of wavelengths 
to take advantage of our first automated parameterization algorithms. As we move to 
support a wider range of spectral resolutions and wavelength coverages we also intend to add 
modules to increase our effective temperature range to T_eff = 50,000 K, extend our surface
gravity range down to log(g) = 1, include different relative O, Mg, and Ca abundances, and begin
spectral classification on the MK system for stars of near solar metallicity. Eventually we 
hope to push the stellar parameters into the M, L, and T dwarf regimes. We recognize that 
model atmospheres are never perfectly accurate and that they improve with time. From 
time to time, where meaningful advances have been made for a particular range of stellar 
atmospheres, we will upgrade our parameterization modules. 
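
Since ANNs trained on a model grid generally cannot be expected to extrapolate reliably beyond it, a front end might check returned parameters against the supported ranges. The sketch below does this for the initial T_eff, log(g), and [Fe/H] ranges quoted above. The dictionary keys and the function name are hypothetical, and microturbulence is omitted.

```python
# Initial parameter ranges quoted in the text (open intervals).
# The key names are hypothetical labels chosen for this sketch.
INITIAL_RANGES = {
    "teff": (4500.0, 8000.0),  # effective temperature, K
    "logg": (2.0, 5.0),        # log of surface gravity
    "feh": (-4.5, 0.5),        # [Fe/H], dex
}

def out_of_range(params, ranges=INITIAL_RANGES):
    """Return the names of parameters that fall outside the supported ranges."""
    return [name for name, (low, high) in ranges.items()
            if not low < params[name] < high]

# Approximately solar parameters fall inside the initial grid.
print(out_of_range({"teff": 5777.0, "logg": 4.44, "feh": 0.0}))
# A hot star lies outside the initial temperature range.
print(out_of_range({"teff": 9500.0, "logg": 4.0, "feh": 0.0}))
```

As modules are added for hotter stars or lower gravities, only the table of ranges would need to change in a design like this.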

Our anticipated users are spectroscopists doing fine-abundance analysis who want a
starting point or a sanity check on their results, those conducting surveys who need classifications
or parameterizations for statistical or pre-selection purposes, and those wanting
independent determinations of the classifications/parameters for their program stars for a 
wide variety of studies. 